This project has investigated the costs imposed by the need for communication
between user and machine and between components of the machine in solving
data-processing problems. There have been three principal subtasks. The first,
exploration of the minimal costs of storing and accessing information in simple
data structures, is the oldest, and results have been reported in several
publications. The second is the design of minimax optimal universal codeword sets
which can be used to represent any message set efficiently by assigning messages
in order of decreasing probability to codewords in order of increasing length.
A  manuscript  giving  results  of  this  research  is  almost  ready  for  publication. 
The  third  is  the  exploration  of  switching  networks  which  can  be  used  for 
communication  between  a  number  of  processors  engaged  in  a  common  computational 
task:  that  work  is  still  in  an  early  stage  and  is  not  yet  ready  for 
publication.
BODY

When  computers  carry  out  interactive  data  processing  tasks,  communication  is 
necessarily  involved.  Commands  must  be  communicated  from  the  user  to  the 
processor,  results  must  be  communicated  from  the  processor  to  the  user,  and  in  the 
course  of  carrying  out  the  computation  there  must  be  communication  between  the 
processor  and  the  memory  hierarchy.  If  multiple  processors  are  involved  in  carrying 
out  portions  of  a  common  task  then  they  must  communicate  with  one  another.  The 
basic  objective  of  the  research  carried  out  in  this  project  has  been  to  explore  the 
contribution  of  these  communications  requirements  to  the  difficulty  of  carrying  out 
data-processing  tasks. 

The  storage  and  retrieval  of  information  presents  a  class  of  problems  for  which 
such  an  investigation  is  especially  appropriate,  since  the  costs  of  a  storage  and 
retrieval  problem  are  largely  the  costs  of  communication  between  processor  and 
memory  and  the  costs  of  storing  information  in  memory.  Information  theory  provides 
results  and  techniques  which  are  appropriate  for  investigating  such  problems,  and 
the  principal  published  work  resulting  from  this  research  has  been  an  analysis  of  the 
trading relations between storage and access costs for best-possible
representations  of  simple  data  structures.  A  review  paper  by  Elias  [1],  reporting 
work  performed  under  a  previous  ARO  contract,  gives  results  in  special  cases.  The 
doctoral  thesis  of  Donna  Brown,  issued  in  [2]  as  a  report  of  the  Laboratory  for 
Computer Science, explores trading relations for linear data structures (lists, stacks
and queues) in detail. Those and related results have been published by Brown in
[3]  and  [4],  with  acknowledgement  of  ARO  support  under  this  contract,  and  another 
publication  by  Brown  is  in  preparation.  So  is  a  publication  by  Elias,  which 
generalizes  the  results  in  [1]. 

One  obvious  application  of  information  theory  to  storage  and  retrieval  problems 
concerns  the  choice  of  representation  for  data  values.  An  entry  in  a  data  base  which 
is  selected  from  a  fixed  set  can  be  assigned  a  codeword  which  is  a  sequence  of 
symbols,  and  if  the  set  of  possible  values  is  large  and  values  occur  with  different 
frequencies  then  Huffman  encoding  can  be  used  to  reduce  the  average  amount  of 
storage  space  required  by  using  the  shortest  codewords  for  the  most  frequent 
values.  A  difficulty  with  this  approach  is  that  in  data  processing  the  exact  knowledge 
of  frequencies  of  different  values  which  is  assumed  in  information  theory  and  is 
available  for  English  text  may  not  be  available  for  a  given  application.  A  second 
difficulty  is  that  a  different  codebook  must  be  consulted  in  looking  up  each  value 
selected  from  a  different  set.  Both  of  these  problems  are  alleviated  by  using  a 
universal  codeword  set,  as  introduced  in  [5].  A  universal  set  of  codewords  is  defined 
to  have  the  property  that  if  messages  in  a  message  set  are  assigned  in  order  of 
decreasing  probability  to  codewords  in  order  of  increasing  length,  the  ratio  of  the 
resulting average codeword length to the length of a best possible (Huffman)
encoding  of  that  source  is  bounded  by  a  constant  for  all  message  sets  whose 
entropy  is  neither  zero  nor  infinite.  In  research  reported  in  [5]  and  supported  by  a 
predecessor ARO contract, I constructed infinite universal codeword sets and
showed  that  the  ratio  of  the  average  codeword  length  for  such  a  set  to  the  average 
length  of  a  Huffman  code  was  bounded  by  3  for  the  worst  possible  probability 
distribution. However, that bound was not shown to be the actual value of the ratio in
the worst case (which turns out to be 2, not 3), nor was that set of codewords shown
to  minimize  the  maximum  of  that  ratio  over  all  message  sets.  More  recent  research 
by  Rissanen  [6]  and  by  Davisson  and  Leon-Garcia  [7]  used  different  measures  of 
performance  than  that  ratio,  and  restricted  their  attention  to  finite  rather  than  infinite 
codeword  sets,  but  were  able  to  find  codeword  sets  which  were  minimax  optimal  by 
their  measures  on  the  sets  of  message  probability  distributions  they  considered.  In 
recent  work  [8]  now  being  prepared  for  publication  I  have  found  fast  algorithms  for 
the  design  of  minimax  optimal  universal  codeword  sets  by  the  ratio  cost  measure: 
the average codeword length for such a code is at most 253/160 ≈ 1.58 times the
average  codeword  length  for  a  Huffman  code  for  the  worst  message  distribution. 
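
The ratio cost measure described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the Elias gamma code is used here as a familiar example of an infinite prefix-free codeword set ordered by increasing length, and it is not claimed to be the particular set analyzed in [5] or the minimax optimal set of [8]. The function names are hypothetical. Messages are ranked by decreasing probability, the message of rank r is given the codeword for the integer r, and the resulting average length is divided by the average length of a Huffman code for the same distribution.

```python
import heapq
from itertools import count


def elias_gamma(n: int) -> str:
    """Elias gamma codeword for a positive integer n (illustrative choice of code)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    binary = bin(n)[2:]  # binary expansion of n; its leading bit is always 1
    # Prefix with (len - 1) zeros so a decoder can tell where the codeword ends.
    return "0" * (len(binary) - 1) + binary


def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code, in the same order as probs."""
    if len(probs) == 1:
        return [1]  # a lone message still needs one symbol
    tie = count()  # unique tie-breaker so the heap never compares the lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        # Merging two subtrees adds one symbol to every codeword below them.
        for i in group1 + group2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), group1 + group2))
    return lengths


def length_ratio(probs):
    """Average gamma-code length divided by average Huffman length."""
    ranked = sorted(probs, reverse=True)  # decreasing probability
    universal = sum(p * len(elias_gamma(rank))
                    for rank, p in enumerate(ranked, start=1))
    huffman = sum(p * l for p, l in zip(ranked, huffman_lengths(ranked)))
    return universal / huffman


if __name__ == "__main__":
    for dist in ([0.5, 0.5], [0.5, 0.25, 0.125, 0.125], [0.1] * 10):
        print(dist, round(length_ratio(dist), 4))
```

Because the same codeword set serves every distribution, no per-field codebook needs to be stored or consulted; the price is the ratio printed above, which never falls below 1 since a Huffman code is optimal for a known distribution.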

One other topic has also been explored by a graduate student, Andrew Boughton,
who  has  been  supported  in  part  under  this  contract.  Boughton  has  investigated  the 
communications  problems  involved  in  using  a  large  number  of  processors  in  parallel 
to solve a single problem. The obvious way of making it possible to connect the
output of one processor to the input of any other is a crossbar, but connecting
n processors in this way takes a number of elements growing like the square of n, which
becomes expensive for large n. Other connection network techniques are
informationally more efficient and require only n log n elements to connect n devices.
However, such networks may require long wires, and thus both significantly greater
delay  and  significantly  greater  implementation  cost  when  built  as  integrated  circuits, 
in  which  wire  is  as  expensive  as  devices.  This  work  is  also  not  yet  ready  for 
publication,  but  Boughton’s  doctoral  thesis  proposal  will  be  complete  shortly. 
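
The difference in growth between the two kinds of network can be made concrete with a small sketch. The report does not say which n log n network is meant; the Benes network is assumed here purely as a standard example of a rearrangeable network built from 2-by-2 switches, and the function names are hypothetical. The counts are of switching elements only, so the wire length and delay penalties discussed above are not modeled.

```python
def crossbar_crosspoints(n: int) -> int:
    """Crosspoints in an n-by-n crossbar: one per input/output pair."""
    if n < 1:
        raise ValueError("n must be positive")
    return n * n


def benes_switches(n: int) -> int:
    """2x2 switches in a Benes network on n terminals (n a power of two, n >= 2).

    The network has 2*log2(n) - 1 stages of n/2 switches each.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    log_n = n.bit_length() - 1  # exact log2 for a power of two
    return (2 * log_n - 1) * (n // 2)


if __name__ == "__main__":
    # A 2x2 switch is not the same unit as a crosspoint, so only the growth
    # rates (n^2 versus n log n) should be compared, not the raw numbers.
    for n in (8, 64, 1024):
        print(n, crossbar_crosspoints(n), benes_switches(n))
```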